Character-to-Word Attention for Word Segmentation: Research Process

Similar Resources

Word-Context Character Embeddings for Chinese Word Segmentation

Neural parsers have benefited from automatically labeled data via dependency-context word embeddings. We investigate training character embeddings on a word-based context in a similar way, showing that this simple method significantly improves state-of-the-art neural word segmentation models, beating tri-training baselines for leveraging auto-segmented data.

Which Is Essential for Chinese Word Segmentation: Character versus Word

This paper presents an empirical comparison between word-based and character-based methods for Chinese word segmentation. Across three Chinese word segmentation Bakeoffs, the character-based method quickly rose to become a mainstream technique in this field. We disclose the linguistic background and statistical features behind this observation. Also, an empirical study between word-based method and charac...

Learning Character Representations for Chinese Word Segmentation

We propose a simple yet effective semi-supervised method for improving Chinese word segmentation. Our method learns generalizable vector and cluster representations of variable-length character sequences from large unlabeled data; these representations are then incorporated as features into a sequence labeling model trained with the passive-aggressive algorithm. We achieve state-of-the-art results on the ...
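
As a rough illustration of the raw material such a method starts from, the sketch below counts every contiguous character sequence of length 1 up to a hypothetical limit `max_len` in unlabeled text. The function name and the length limit are assumptions made for illustration only; learning the vector and cluster representations themselves is not shown.

```python
from collections import Counter


def count_char_ngrams(lines, max_len=4):
    """Count every contiguous character sequence of length 1..max_len.

    Hypothetical helper: max_len is an illustrative assumption, not a
    value taken from the paper.
    """
    counts = Counter()
    for line in lines:
        text = line.strip()
        for n in range(1, max_len + 1):
            # Slide a window of width n over the line.
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


corpus = ["北京大学", "北京市"]
counts = count_char_ngrams(corpus, max_len=2)
print(counts["北京"])  # 2
```

Frequent sequences such as "北京" recur across contexts, which is what makes them candidates for learned representations.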

Chinese Word Segmentation as Character Tagging

In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. A maximum entropy tagger is trained on manually annotated data to automatically assign each Chinese character (hanzi) a tag that indicates its position within a word. The tagged output is then converted into segmented text for evaluation. Preliminary results show that this approach...
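
The final step described above, turning per-character position tags back into segmented text, can be sketched as follows. This assumes the widely used four-tag B/M/E/S scheme (begin, middle, end, single-character word), which may differ from the tag set used in that paper. The function name `tags_to_words` is hypothetical.

```python
def tags_to_words(chars: str, tags: list[str]) -> list[str]:
    """Convert per-character position tags into a list of words.

    Assumes the B (begin), M (middle), E (end), S (single) tag scheme.
    """
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: list[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts before the previous one was closed:
            # flush the unfinished word so ill-formed tag sequences still work.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # This character closes a word.
            words.append(current)
            current = ""
    if current:
        # Flush a trailing word that was never closed with E.
        words.append(current)
    return words


print(tags_to_words("我喜欢北京", ["S", "B", "E", "B", "E"]))
# ['我', '喜欢', '北京']
```

Joining the result with spaces (`" ".join(...)`) gives the space-delimited format commonly used when scoring segmentation output.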

A Character-Based Joint Model for Chinese Word Segmentation

The character-based tagging approach is a dominant technique for Chinese word segmentation, and both discriminative and generative models can be adopted in that framework. However, generative and discriminative character-based approaches are significantly different and complement each other. A simple joint model combining the character-based generative model and the discriminative one is thus p...

Journal

Journal title: Journal of Natural Language Processing

Year: 2021

ISSN: 1340-7619, 2185-8314

DOI: 10.5715/jnlp.28.688